{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this notebook we explore the `fuse` and `optimize_fusion` methods and the `OptimizationReport` class offered by `ranx`."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First of all we need to install [ranx](https://github.com/AmenRa/ranx)\n",
    "\n",
    "Mind that the first time you run any ranx' functions they may take a while as they must be compiled first"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -U ranx"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Download the data we need"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import requests\n",
    "\n",
    "for file in [\"qrels\", \"run_4\", \"run_5\"]:\n",
    "    os.makedirs(\"notebooks/data\", exist_ok=True)\n",
    "\n",
    "    with open(f\"notebooks/data/{file}.trec\", \"w\") as f:\n",
    "        master = f\"https://raw.githubusercontent.com/AmenRa/ranx/master/notebooks/data/{file}.trec\"\n",
    "        f.write(requests.get(master).text)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Load data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from ranx import Qrels, Run\n",
    "\n",
    "# Let's load qrels and runs from files\n",
    "qrels = Qrels.from_file(\"notebooks/data/qrels.trec\")\n",
    "\n",
    "run_4 = Run.from_file(\"notebooks/data/run_4.trec\")\n",
    "run_4.name = \"System A\"\n",
    "run_5 = Run.from_file(\"notebooks/data/run_5.trec\")\n",
    "run_5.name = \"System B\""
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fuse\n",
    "\n",
    "Here are reported all the fusion algorithms provided by `ranx`, along with their aliases.  \n",
    "The **Optim.** column indicates whether an algorithm require an optimization phase.\n",
    "\n",
    "| **Algorithm**                                              | **Alias** | **Optim.** | **Algorithm**                            | **Alias**   | **Optim.** |\n",
    "| ---------------------------------------------------------- | --------- | :--------: | ---------------------------------------- | ----------- | :--------: |\n",
    "| CombMIN                                         | min       |     No     | CombMAX                       | max         |     No     |\n",
    "| CombMED                                         | med       |     No     | CombSUM                       | sum         |     No     |\n",
    "| CombANZ                                         | anz       |     No     | CombMNZ                       | mnz         |     No     |\n",
    "| CombGMNZ                                       | gmnz      |     No     | ISR                               | isr         |     No     |\n",
    "| Log_ISR                                         | log_isr   |     No     | LogN_ISR                     | logn_isr    |    Yes     |\n",
    "| Reciprocal Rank Fusion (RRF) | rrf       |    Yes     | PosFuse                       | posfuse     |    Yes     |\n",
    "| ProbFuse                                       | probfuse  |    Yes     | SegFuse                       | segfuse     |    Yes     |\n",
    "| SlideFuse                                     | slidefuse |    Yes     | MAPFuse                       | mapfuse     |    Yes     |\n",
    "| BordaFuse                                     | bordafuse |     No     | Weighted BordaFuse | w_bordafuse |    Yes     |\n",
    "| Condorcet                                     | condorcet |     No     | Weighted Condorcet | w_condorcet |    Yes     |\n",
    "| BayesFuse                                     | bayesfuse |    Yes     | Mixed                           | mixed       |    Yes     |\n",
    "| WMNZ                                               | wmnz      |    Yes     | Weighted Sum               | wsum        |    Yes     |\n",
    "| Rank-Biased Centroids (RBC)   | rbc       |      yes      |                                          |             |\n",
    "\n",
    "Let's try some _unsupervised_ fusion algorithms!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from ranx import fuse, evaluate\n",
    "\n",
    "print(run_4.name, evaluate(qrels, run_4, \"ndcg@100\"))\n",
    "print(run_5.name, evaluate(qrels, run_5, \"ndcg@100\"))\n",
    "\n",
    "for method in [\n",
    "    \"min\",  # Alias for CombMIN\n",
    "    \"max\",  # Alias for CombMAX\n",
    "    \"med\",  # Alias for CombMED\n",
    "    \"sum\",  # Alias for CombSUM\n",
    "    \"anz\",  # Alias for CombANZ\n",
    "    \"mnz\",  # Alias for CombMNZ\n",
    "]:\n",
    "    combined_run = fuse(\n",
    "        runs=[run_4, run_5],\n",
    "        norm=\"min-max\",  # Default normalization strategy\n",
    "        method=method,\n",
    "    )\n",
    "\n",
    "    print(combined_run.name, evaluate(qrels, combined_run, \"ndcg@100\"))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Normalization\n",
    "\n",
    "Let's try out other normalization strategies!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from ranx import fuse, evaluate\n",
    "\n",
    "print(run_4.name, evaluate(qrels, run_4, \"ndcg@100\"))\n",
    "print(run_5.name, evaluate(qrels, run_5, \"ndcg@100\"))\n",
    "\n",
    "for norm in [\"min-max\", \"max\", \"sum\", \"zmuv\", \"rank\", \"borda\"]:\n",
    "    combined_run = fuse(\n",
    "        runs=[run_4, run_5],\n",
    "        norm=norm,\n",
    "        method=\"sum\",  # Alias for CombSUM\n",
    "    )\n",
    "\n",
    "    print(norm, evaluate(qrels, combined_run, \"ndcg@100\"))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optimize Fusion\n",
    "\n",
    "Let's try some fusion algorithm that requires optimization!\n",
    "\n",
    "WARNING: here we use the same runs for optimizing the algorithms and to get the final combination.  \n",
    "However, in a real-world scenario you should use non-test data for the optimization phase.  \n",
    "For example, rels and runs for the dev set or few hundreads/thousands of train samples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from ranx import fuse, evaluate, optimize_fusion\n",
    "\n",
    "print(run_4.name, evaluate(qrels, run_4, \"ndcg@100\"))\n",
    "print(run_5.name, evaluate(qrels, run_5, \"ndcg@100\"))\n",
    "\n",
    "# Optimize a given fusion method\n",
    "best_params = optimize_fusion(\n",
    "    qrels=qrels,\n",
    "    runs=[run_4, run_5],\n",
    "    norm=\"min-max\",  # Default value\n",
    "    method=\"wsum\",  # Alias for Weighted Sum\n",
    "    metric=\"ndcg@100\",  # Metric we want to maximize\n",
    ")\n",
    "\n",
    "combined_run = fuse(\n",
    "    runs=[run_4, run_5],\n",
    "    norm=\"min-max\",  # Default value\n",
    "    method=\"wsum\",  # Alias for Weighted Sum\n",
    "    params=best_params,\n",
    ")\n",
    "\n",
    "print(combined_run.name, evaluate(qrels, combined_run, \"ndcg@100\"))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The hyper-parameter search space can be altered as in the next cell.  \n",
    "Please, refer to the official documentation for a complete list of the  \n",
    "search space parameters of each algorithm."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from ranx import fuse, evaluate, optimize_fusion\n",
    "\n",
    "print(run_4.name, evaluate(qrels, run_4, \"ndcg@100\"))\n",
    "print(run_5.name, evaluate(qrels, run_5, \"ndcg@100\"))\n",
    "\n",
    "# Optimize a given fusion method\n",
    "best_params = optimize_fusion(\n",
    "    qrels=qrels,\n",
    "    runs=[run_4, run_5],\n",
    "    norm=\"min-max\",  # Default value\n",
    "    method=\"wsum\",  # Alias for Weighted Sum\n",
    "    metric=\"ndcg@100\",  # Metric we want to maximize\n",
    "    step=0.01,\n",
    ")\n",
    "\n",
    "combined_run = fuse(\n",
    "    runs=[run_4, run_5],\n",
    "    norm=\"min-max\",  # Default value\n",
    "    method=\"wsum\",  # Alias for Weighted Sum\n",
    "    params=best_params,\n",
    ")\n",
    "\n",
    "print(combined_run.name, evaluate(qrels, combined_run, \"ndcg@100\"))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `optimize_fusion` method can also return a report of all the evaluated configurations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from ranx import fuse, evaluate, optimize_fusion\n",
    "\n",
    "best_params, optimization_report = optimize_fusion(\n",
    "    qrels=qrels,\n",
    "    runs=[run_4, run_5],\n",
    "    norm=\"min-max\",\n",
    "    method=\"wsum\",\n",
    "    metric=\"ndcg@100\",\n",
    "    return_optimization_report=True,\n",
    ")\n",
    "\n",
    "# The optimization results are saved in a OptimizationReport instance,\n",
    "# which provides handy functionalities such as tabular formatting\n",
    "optimization_report.to_table()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# You can change the number of shown digits as follows\n",
    "optimization_report.rounding_digits = 4\n",
    "optimization_report.to_table()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# You can show percentages instead of digits\n",
    "# Note that the number of shown digits is based on\n",
    "# the `rounding_digits` attribute, try changing it\n",
    "optimization_report.rounding_digits = 3\n",
    "optimization_report.show_percentages = True\n",
    "optimization_report.to_table()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# `rounding_digits` and `show_percentages` can be passed directly when\n",
    "# calling `optimize_fusion`\n",
    "best_params, optimization_report = optimize_fusion(\n",
    "    qrels=qrels,\n",
    "    runs=[run_4, run_5],\n",
    "    norm=\"min-max\",\n",
    "    method=\"wsum\",\n",
    "    metric=\"ndcg@100\",\n",
    "    return_optimization_report=True,\n",
    "    rounding_digits=4,\n",
    "    show_percentages=True,\n",
    ")\n",
    "\n",
    "optimization_report.to_table()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.7.12 ('ranx')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.12"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "692976b4e7dc85f192e7ff7f01132f0a9cfec5b2147f9c392e0014b728af08ff"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
